Integrating Encyclopedic Knowledge into Neural Language Models
Abstract
Neural models have recently brought substantial improvements to the performance of phrase-based machine translation. Recurrent language models in particular have been highly successful because they can model arbitrarily long context. In this work, we integrate global semantic information extracted from large encyclopedic sources into neural network language models. Specifically, we add semantic word classes extracted from Wikipedia and sentence-level topic information to a recurrent neural network-based language model. The additional knowledge gives the resulting models considerable potential for alleviating data sparsity problems. This approach to integrating global information is not restricted to language modeling; it can easily be applied to any model that benefits from context or additional data resources, e.g., neural machine translation. Rescoring with this model improved the output of a state-of-the-art phrase-based translation system by 0.84 BLEU points. We performed experiments on two language pairs.
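The abstract does not give architectural details, so the following is only a minimal pure-Python sketch of the general idea, under stated assumptions: an Elman-style RNN language model whose input at each step concatenates a word embedding, an embedding of the word's semantic class (such as a class derived from Wikipedia), and a fixed sentence-level topic vector, plus a simple n-best rescoring step that adds a weighted LM log-probability to a baseline decoder score. All names (`TopicClassRNNLM`, `word2class`, `rescore`), the dimensions, and the interpolation weight are hypothetical. The weights are random and untrained, and this is not the authors' implementation.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax."""
    mx = max(z)
    e = [math.exp(x - mx) for x in z]
    s = sum(e)
    return [x / s for x in e]


class TopicClassRNNLM:
    """Hypothetical Elman RNN LM: input = [word emb; class emb; topic vector]."""

    def __init__(self, vocab, word2class, n_classes, n_topics,
                 emb=4, cls_emb=2, hidden=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.index = {w: i for i, w in enumerate(vocab)}
        self.word2class = word2class          # word -> semantic class id
        self.hidden = hidden
        self.E = init(len(vocab), emb)        # word embeddings
        self.C = init(n_classes, cls_emb)     # semantic-class embeddings
        in_dim = emb + cls_emb + n_topics
        self.W_in = init(hidden, in_dim)      # input -> hidden
        self.W_rec = init(hidden, hidden)     # hidden -> hidden (recurrence)
        self.W_out = init(len(vocab), hidden) # hidden -> vocabulary logits

    def step(self, word, h, topic):
        """One RNN step; returns the new hidden state and next-word distribution."""
        # Concatenate word embedding, class embedding and topic vector.
        x = self.E[self.index[word]] + self.C[self.word2class[word]] + list(topic)
        a = [u + r for u, r in zip(matvec(self.W_in, x), matvec(self.W_rec, h))]
        h_new = [math.tanh(v) for v in a]
        return h_new, softmax(matvec(self.W_out, h_new))

    def sentence_logprob(self, words, topic):
        """Sum of log P(w_t | w_<t, topic) over the sentence."""
        h = [0.0] * self.hidden
        total = 0.0
        for prev, nxt in zip(words, words[1:]):
            h, p = self.step(prev, h, topic)
            total += math.log(p[self.index[nxt]])
        return total


def rescore(nbest, lm, topic, weight=0.5):
    """Re-rank (tokens, baseline_score) pairs by baseline + weight * LM log-prob."""
    return sorted(nbest,
                  key=lambda hyp: hyp[1] + weight * lm.sentence_logprob(hyp[0], topic),
                  reverse=True)


# Toy usage with an invented vocabulary, class map and topic vector.
vocab = ["<s>", "the", "cat", "sat", "</s>"]
word2class = {"<s>": 0, "the": 0, "cat": 1, "sat": 2, "</s>": 0}
lm = TopicClassRNNLM(vocab, word2class, n_classes=3, n_topics=2)
topic = [0.7, 0.3]
sentence = ["<s>", "the", "cat", "sat", "</s>"]
score = lm.sentence_logprob(sentence, topic)
nbest = [(sentence, -1.2), (["<s>", "cat", "the", "sat", "</s>"], -1.5)]
ranked = rescore(nbest, lm, topic)
```

Because the topic vector is fed at every time step, sentence-level information can influence each prediction regardless of how far back it would otherwise lie in the history; in practice the rescoring weight would be tuned on held-out data rather than fixed.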
Similar resources
Linking Domain-Specific Knowledge to Encyclopedic Knowledge: an Initial Approach to Linked Data
Linked Data creates a shared information space by publishing and connecting resources in the Semantic Web. However, the specification of semantic relationships between data sources is still a stumbling block. One solution is to enrich ontologies with multilingual and concept-oriented information. Usefully linking entities in the Semantic Web is thus facilitated by a semantic-oriented cross-ling...
Towards Model Driven Architectures for Human Language Technologies
Developing multi-purpose Human Language Technologies (HLT) pipelines and integrating them into large-scale software environments is a complex software engineering task. One needs to orchestrate a variety of new and legacy Natural Language Processing components, language models, linguistic and encyclopedic knowledge resources. This requires working with a variety of different APIs, data form...
MASAQ: A Multi-Agent System for Answering Questions Based on an Encyclopedic Knowledge Base
In this paper, we present a multi-agent system, called MASAQ, for answering users’ queries based on an encyclopedic knowledge base. MASAQ has three major components: (1) a natural language interface; (2) an executable specification language (EASL) for developing multi-agent systems for answering or reasoning about users’ queries; (3) an encyclopedic knowledge base covering twenty-one domains. I...
Simulated Action in an Embodied Construction Grammar
Various lines of research on language have converged on the premise that linguistic knowledge has as its basic unit pairings of form and meaning. The precise nature of the meanings involved, however, remains subject to the longstanding debate between proponents of arbitrary, abstract representations and those who argue for more detailed perceptuo-motor representations. We propose a model, Embod...
Neural Network Language Model for Chinese Pinyin Input Method Engine
Neural network language models (NNLMs) have been shown to outperform traditional n-gram language models. However, the high computational cost of NNLMs is the main obstacle to integrating them directly into a pinyin IME, which normally requires a real-time response. In this paper, an efficient solution is proposed by converting NNLMs into back-off n-gram language models, and we integrate the conver...
Publication date: 2016